Extraction of Russian Sentiment Lexicon for Product Meta-Domain
Abstract
In this paper we consider a new approach to domain-specific sentiment lexicon extraction in Russian. We propose a set of statistical features and a combination of algorithms that can discriminate sentiment words in a specific domain. The extraction model is trained in the movie domain and then applied to other domains. We evaluate the quality of the obtained sentiment vocabularies intrinsically. Finally, we combine the sentiment lexicons from five domains to obtain one general lexicon for the product meta-domain. We demonstrate the robustness of the extracted lexicon in cross-domain sentiment classification in Russian.
TITLE AND ABSTRACT IN RUSSIAN
Извлечение Словаря Оценочной Лексики на Русском Языке для Мета-Области Товаров
В данной работе рассматривается новый подход к извлечению предметно-ориентированного словаря оценочной лексики на русском языке. Мы предлагаем использовать совокупность статистических и лингвистических признаков, позволяющих выявлять оценочные слова, и комбинировать эти признаки с помощью алгоритмов машинного обучения. Модель извлечения создается для предметной области фильмов, а затем применяется в других предметных областях. Мы оцениваем качество полученных словарей оценочных слов посредством ручной разметки. Наконец, мы собираем из отдельных словарей общий словарь оценочных слов, рассматривая его как оценочный словарь в широкой области товаров. Мы демонстрируем полезность полученного общего лексикона в задаче переноса модели анализа тональности с одной области на другую для отзывов пользователей на русском языке.
Related papers
DomEx: Extraction of Sentiment Lexicons for Domains and Meta-Domains
In this paper we describe DomEx, a sentiment lexicon extractor that implements a new approach to domain-specific sentiment lexicon extraction. Extraction is based on a machine learning model comprising a set of statistical and linguistic features. The extraction model is trained in the movie domain and can then be applied to other domains. The system can work with var...
Уточнение русскоязычных словарей эмоциональной лексики с использованием тезауруса RuThes (Refinement of Russian Sentiment Lexicons Using RuThes Thesaurus)
The paper describes a combined approach to the extraction of a domain-specific sentiment lexicon. First, an initial version of a domain-specific lexicon is obtained by applying a supervised model. In the second stage, the ordered list of sentiment words is refined using thesaurus information. This combined model is applied to several domains and at last the domain-specific sentiment lex...
A Supervised Method for Constructing Sentiment Lexicon in Persian Language
Due to the growth of digital content on the internet and social media, sentiment analysis has become an emerging field. This problem, which deals with information extraction and knowledge discovery from textual data using natural language processing, has attracted the attention of many researchers. Construction of a sentiment lexicon as a valuable language resource is one of the imp...
Creating a General Russian Sentiment Lexicon
The paper describes the new Russian sentiment lexicon RuSentiLex. The lexicon was gathered from several sources: opinionated words from domain-oriented Russian sentiment vocabularies, slang and curse words extracted from Twitter, objective words with positive or negative connotations from a news collection. The words in the lexicon having different sentiment orientations in specific senses are ...
Two-Step Model for Sentiment Lexicon Extraction from Twitter Streams
In this study we explore a novel technique for creating polarity lexicons from Twitter streams in Russian and English. To this end, we first filter subjective tweets using general domain-independent lexicons in each language. The subjective tweets are then used to extract domain-specific sentiment words. Relying on co-occurrence statistics of extracted words in a...